-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add datasets and documentation #1
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Would be happy to get your feedback on this prototype dataset card |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks really good to me.
src/squirrel_datasets_core/datasets/kaggle_casting_quality/README.rst
Outdated
Show resolved
Hide resolved
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good already! Dataset card is quite nice 👍
Depending on when this PR gets ready to merge, you may want to refrain from adding the driver .py files, since there will be a lot of changes with the new squirrel store and driver apis.
src/squirrel_datasets_core/datasets/kaggle_casting_quality/README.rst
Outdated
Show resolved
Hide resolved
@winfried-loetzsch Very cool! Would be nice to have the templates also in the devtool, so that other projects get this when rendering the skeleton. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
There shall also be tests. |
Description
will give
Type of change
Checklist:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far everything LGTM. Thanks a lot for the work!
Also interesting to see that all the 'data cards' are in each dataset folder instead of in docs. Is there any reason to prefer this way?
except AttributeError: | ||
pass | ||
|
||
return drivers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will add squirrel_sources here in the future, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to exclude the sources if they point to our internal gcp buckets, but I think we could include some of the huggingface sources for example. Let's discuss this on Monday :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left minor remarks.
docs/source/conf.py
Outdated
intersphinx_mapping = { | ||
"python": ("https://docs.python.org/3/", None), | ||
"sphinx": ("https://www.sphinx-doc.org/en/master/", None), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for future: We can add https://squirrel.readthedocs.io here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know we have already created the page. Looks nice!
torchvision | ||
scipy | ||
pyspark==3.2.0 | ||
hydra-core>=1.1.0 | ||
hub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In squirrel-datasets, we have different flavors for e.g. torchvision and hub. Do we have them all in a single reqs.in here on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's discuss then in the group if it makes sense to have the flavors or not :)
|
||
# generate extras based on requirements files | ||
extras_require = dict() | ||
for a_extra in ["dev", "preprocessing", "torchvision", "hub"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove some of these extras if we dont want flavors
This was brought up by @ThomasWollmann, mainly to simplify the process of adding a dataset (have everything in one place) and be able to navigate in github and see the readme/dataset card while browsing through the implementations of the datasets |
TODO